09. Serialization

Beyond accessing model attributes directly via their field names (e.g. model.foobar), models can be converted, dumped, serialized, and exported in a number of ways.

Pydantic uses the terms "serialize" and "dump" interchangeably. Both refer to the process of converting a model to a dictionary or a JSON-encoded string.

`model.model_dump(...)`

This is the primary way of converting a model to a dictionary. Sub-models will be recursively converted to dictionaries.

The one exception to sub-models being converted to dictionaries is that RootModel and its subclasses will have the root field value dumped directly, without a wrapping dictionary. This is also done recursively.

from typing import Any, List, Optional
from pydantic import BaseModel, Field, Json

class BarModel(BaseModel):
    whatever: int

class FooBarModel(BaseModel):
    banana: Optional[float] = 1.1
    foo: str = Field(serialization_alias='foo_alias')
    bar: BarModel

m = FooBarModel(banana=3.14, foo='hello', bar={'whatever': 123})

# returns a dictionary:
print(m.model_dump())
#> {'banana': 3.14, 'foo': 'hello', 'bar': {'whatever': 123}}
print(m.model_dump(include={'foo', 'bar'}))
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(m.model_dump(exclude={'foo', 'bar'}))
#> {'banana': 3.14}
print(m.model_dump(by_alias=True))
#> {'banana': 3.14, 'foo_alias': 'hello', 'bar': {'whatever': 123}}
print(
    FooBarModel(foo='hello', bar={'whatever': 123}).model_dump(
        exclude_unset=True
    )
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(
    FooBarModel(banana=1.1, foo='hello', bar={'whatever': 123}).model_dump(
        exclude_defaults=True
    )
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(
    FooBarModel(foo='hello', bar={'whatever': 123}).model_dump(
        exclude_defaults=True
    )
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(
    FooBarModel(banana=None, foo='hello', bar={'whatever': 123}).model_dump(
        exclude_none=True
    )
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}


class Model(BaseModel):
    x: List[Json[Any]]


print(Model(x=['{"a": 1}', '[1, 2]']).model_dump())
#> {'x': [{'a': 1}, [1, 2]]}
print(Model(x=['{"a": 1}', '[1, 2]']).model_dump(round_trip=True))
#> {'x': ['{"a":1}', '[1,2]']}
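The RootModel exception described at the start of this section can be sketched as follows. The class names `Tags` and `Post` are hypothetical and exist only for illustration; the point is that the nested root model is dumped as its bare root value rather than as a wrapping dictionary.

```python
from typing import List

from pydantic import BaseModel, RootModel


class Tags(RootModel[List[str]]):
    """Hypothetical root model wrapping a plain list of strings."""


class Post(BaseModel):
    title: str
    tags: Tags


post = Post(title='hello', tags=['a', 'b'])

# The nested RootModel is dumped as its root value, not as {'root': [...]}.
dumped = post.model_dump()
print(dumped)
#> {'title': 'hello', 'tags': ['a', 'b']}
```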

`model.model_dump_json(...)`

The .model_dump_json() method serializes a model directly to a JSON-encoded string that is equivalent to the result produced by .model_dump().

from datetime import datetime
from pydantic import BaseModel

class BarModel(BaseModel):
    whatever: int

class FooBarModel(BaseModel):
    foo: datetime
    bar: BarModel

m = FooBarModel(foo=datetime(2032, 6, 1, 12, 13, 14), bar={'whatever': 123})
print(m.model_dump_json())
#> {"foo":"2032-06-01T12:13:14","bar":{"whatever":123}}
print(m.model_dump_json(indent=2))
"""
{
  "foo": "2032-06-01T12:13:14",
  "bar": {
    "whatever": 123
  }
}
"""

Common parameters

Reference: the BaseModel page of the Pydantic API documentation.

| Name | Type | Description | Default |
|---|---|---|---|
| `indent` | `int \| None` (`model_dump_json` only) | Indentation to use in the JSON output. If `None`, the output is compact. | `None` |
| `mode` | `Literal['json', 'python'] \| str` (`model_dump` only) | The mode in which `to_python` should run. In `'json'` mode the output contains only JSON-serializable types. In `'python'` mode the output may contain Python objects that are not JSON-serializable. | `'python'` |
| `include` | `IncEx` | Field(s) to include in the output. | `None` |
| `exclude` | `IncEx` | Field(s) to exclude from the output. | `None` |
| `context` | `dict[str, Any] \| None` | Additional context to pass to the serializer. | `None` |
| `by_alias` | `bool` | Whether to serialize using field aliases. | `False` |
| `exclude_unset` | `bool` | Whether to exclude fields that have not been explicitly set. | `False` |
| `exclude_defaults` | `bool` | Whether to exclude fields whose value equals their default. | `False` |
| `exclude_none` | `bool` | Whether to exclude fields whose value is `None`. | `False` |
| `round_trip` | `bool` | If `True`, dumped values should be valid as input for non-idempotent types such as `Json[T]`. | `False` |
| `warnings` | `bool \| Literal['none', 'warn', 'error']` | How to handle serialization errors. `False`/`'none'` ignores them, `True`/`'warn'` logs errors, `'error'` raises a `PydanticSerializationError`. | `True` |
| `serialize_as_any` | `bool` | Whether to serialize fields with duck-typing serialization behavior. | `False` |
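To make the `mode` parameter concrete, here is a minimal sketch that compares the two modes of `model_dump()` with the string produced by `model_dump_json()`. The `Event` model is a hypothetical example, not part of the original notes.

```python
import json
from datetime import datetime

from pydantic import BaseModel


class Event(BaseModel):
    name: str
    when: datetime


e = Event(name='launch', when=datetime(2032, 6, 1, 12, 13, 14))

# 'python' mode (the default) keeps rich Python objects such as datetime.
python_dump = e.model_dump()

# 'json' mode converts values to JSON-serializable types only.
json_mode_dump = e.model_dump(mode='json')

# model_dump_json() returns a string; parsing it gives the same data
# as the 'json' mode dictionary.
json_string = e.model_dump_json()

print(type(python_dump['when']).__name__)
#> datetime
print(json_mode_dump['when'])
#> 2032-06-01T12:13:14
print(json.loads(json_string) == json_mode_dump)
#> True
```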

`dict(model)` and iteration

Pydantic models can also be converted to dictionaries using dict(model), but this conversion is not recursive, so sub-models will not be converted to dictionaries.

You can also iterate over a model's fields using for field_name, field_value in model:

from pydantic import BaseModel


class BarModel(BaseModel):
    whatever: int

class FooBarModel(BaseModel):
    banana: float
    foo: str
    bar: BarModel

m = FooBarModel(banana=3.14, foo='hello', bar={'whatever': 123})

print(dict(m))
#> {'banana': 3.14, 'foo': 'hello', 'bar': BarModel(whatever=123)}
for name, value in m:
    print(f'{name}: {value}')
    #> banana: 3.14
    #> foo: hello
    #> bar: whatever=123

Note also that RootModel does get converted to a dictionary with the key 'root'.
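As a small illustration of that note, the sketch below contrasts dict() with model_dump() on a root model. `Numbers` is a hypothetical class used only for this example.

```python
from typing import List

from pydantic import RootModel


class Numbers(RootModel[List[int]]):
    """Hypothetical root model wrapping a list of integers."""


n = Numbers([1, 2, 3])

# dict() iterates over the model's fields, so the single 'root' field appears as a key.
as_dict = dict(n)
print(as_dict)
#> {'root': [1, 2, 3]}

# model_dump() returns the root value directly, without a wrapping dictionary.
as_dump = n.model_dump()
print(as_dump)
#> [1, 2, 3]
```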

Custom serializers

Pydantic provides several functional serializers to customise how a model is serialized to a dictionary or JSON.

Use the @field_serializer decorator to change how a single field is serialized, and the @model_serializer decorator to change how the whole model is serialized.

from datetime import datetime, timedelta, timezone
from typing import Any, Dict

from pydantic import BaseModel, ConfigDict, field_serializer, model_serializer


class WithCustomEncoders(BaseModel):
    model_config = ConfigDict(ser_json_timedelta='iso8601')

    dt: datetime
    diff: timedelta

    @field_serializer('dt')
    def serialize_dt(self, dt: datetime, _info):
        return dt.timestamp()


m = WithCustomEncoders(
    dt=datetime(2032, 6, 1, tzinfo=timezone.utc), diff=timedelta(hours=100)
)
print(m.model_dump_json())
#> {"dt":1969660800.0,"diff":"P4DT4H"}


class Model(BaseModel):
    x: str

    @model_serializer
    def ser_model(self) -> Dict[str, Any]:
        return {'x': f'serialized {self.x}'}


print(Model(x='test value').model_dump_json())
#> {"x":"serialized test value"}

A single serializer can also be called on all fields by passing the special value '*' to the @field_serializer decorator.
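A minimal sketch of the '*' form, assuming a hypothetical `Labels` model whose fields are all strings, so that one serializer can sensibly handle every field:

```python
from pydantic import BaseModel, field_serializer


class Labels(BaseModel):
    first: str
    second: str

    # '*' registers this serializer for every field on the model.
    @field_serializer('*')
    def upper_case(self, value: str) -> str:
        return value.upper()


labels_dump = Labels(first='alpha', second='beta').model_dump()
print(labels_dump)
#> {'first': 'ALPHA', 'second': 'BETA'}
```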

In addition, PlainSerializer and WrapSerializer enable you to use a function to modify the output of serialization.

Both serializers accept two optional arguments:

return_type specifies the return type for the function. If omitted it will be inferred from the type annotation.
when_used specifies when this serializer should be used. Accepts a string with values 'always', 'unless-none', 'json', and 'json-unless-none'. Defaults to 'always'.

PlainSerializer uses a simple function to modify the output of serialization.

from typing_extensions import Annotated
from pydantic import BaseModel
from pydantic.functional_serializers import PlainSerializer

FancyInt = Annotated[
    int, PlainSerializer(lambda x: f'{x:,}', return_type=str, when_used='json')
]


class MyModel(BaseModel):
    x: FancyInt


print(MyModel(x=1234).model_dump())
#> {'x': 1234}

print(MyModel(x=1234).model_dump(mode='json'))
#> {'x': '1,234'}

WrapSerializer receives the raw inputs along with a handler function that applies the standard serialization logic, and can modify the resulting value before returning it as the final output of serialization.

from typing import Any

from typing_extensions import Annotated

from pydantic import BaseModel, SerializerFunctionWrapHandler
from pydantic.functional_serializers import WrapSerializer

def ser_wrap(v: Any, nxt: SerializerFunctionWrapHandler) -> str:
    return f'{nxt(v + 1):,}'

FancyInt = Annotated[int, WrapSerializer(ser_wrap, when_used='json')]

class MyModel(BaseModel):
    x: FancyInt

print(MyModel(x=1234).model_dump())
#> {'x': 1234}

print(MyModel(x=1234).model_dump(mode='json'))
#> {'x': '1,235'}
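The examples above both use when_used='json'. As a complementary sketch, the following shows 'unless-none', where the serializer function is skipped for None values. The `Profile` model and the `UpperStr` alias are hypothetical names chosen for this illustration.

```python
from typing import Optional

from typing_extensions import Annotated

from pydantic import BaseModel
from pydantic.functional_serializers import PlainSerializer

# With 'unless-none' the lambda is never called with None,
# so calling .upper() on the value is safe here.
UpperStr = Annotated[
    Optional[str],
    PlainSerializer(lambda s: s.upper(), return_type=str, when_used='unless-none'),
]


class Profile(BaseModel):
    nickname: UpperStr = None


with_value = Profile(nickname='ada').model_dump()
without_value = Profile().model_dump()

print(with_value)
#> {'nickname': 'ADA'}
print(without_value)
#> {'nickname': None}
```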

Overriding the return type of model_dump

@model_serializer can also change the return type of .model_dump(), which is normally dict[str, Any].

from pydantic import BaseModel, model_serializer

class Model(BaseModel):
    x: str

    @model_serializer
    def ser_model(self) -> str:
        return self.x

print(Model(x='not a dict').model_dump())
#> not a dict

If you want to do this and still get proper type-checking for this method, you can override .model_dump() in an if TYPE_CHECKING: block:

from typing import TYPE_CHECKING, Any

from typing_extensions import Literal

from pydantic import BaseModel, model_serializer


class Model(BaseModel):
    x: str

    @model_serializer
    def ser_model(self) -> str:
        return self.x

    if TYPE_CHECKING:
        # Ensure type checkers see the correct return type
        def model_dump(
            self,
            *,
            mode: Literal['json', 'python'] | str = 'python',
            include: Any = None,
            exclude: Any = None,
            by_alias: bool = False,
            exclude_unset: bool = False,
            exclude_defaults: bool = False,
            exclude_none: bool = False,
            round_trip: bool = False,
            warnings: bool = True,
        ) -> str:
            ...

This trick is actually used in RootModel for precisely this purpose.

Serializing subclasses

Subclasses of standard types

Subclasses of standard types are automatically dumped like their super-classes.

from datetime import date, timedelta
from typing import Any, Type

from pydantic_core import core_schema

from pydantic import BaseModel, GetCoreSchemaHandler


class DayThisYear(date):
    """
    Contrived example of a special type of date that
    takes an int and interprets it as a day in the current year
    """

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Type[Any], handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        return core_schema.no_info_after_validator_function(
            cls.validate,
            core_schema.int_schema(),
            serialization=core_schema.format_ser_schema('%Y-%m-%d'),
        )

    @classmethod
    def validate(cls, v: int):
        return date(2023, 1, 1) + timedelta(days=v)


class FooModel(BaseModel):
    date: DayThisYear


m = FooModel(date=300)
print(m.model_dump_json())
#> {"date":"2023-10-28"}

Subclasses of `BaseModel`, `dataclasses`, and `TypedDict`

When using fields whose annotations are themselves struct-like types (e.g., BaseModel subclasses, dataclasses, etc.), the default behavior is to serialize the attribute value as though it was an instance of the annotated type, even if it is a subclass. More specifically, only the fields from the annotated type will be included in the dumped object:

from pydantic import BaseModel

class User(BaseModel):
    name: str

class UserLogin(User):
    password: str

class OuterModel(BaseModel):
    user: User

user = UserLogin(name='pydantic', password='hunter2')

m = OuterModel(user=user)
print(m)
#> user=UserLogin(name='pydantic', password='hunter2')
print(m.model_dump())  # note: the password field is not included
#> {'user': {'name': 'pydantic'}}
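If the subclass fields should be kept, one option is the SerializeAsAny annotation, which opts a single field into duck-typing serialization (the behaviour the `serialize_as_any` parameter refers to). The sketch below reuses the models from the example above; the field names `as_any` and `as_user` are chosen only to show both behaviours side by side.

```python
from pydantic import BaseModel, SerializeAsAny


class User(BaseModel):
    name: str


class UserLogin(User):
    password: str


class OuterModel(BaseModel):
    # Serialized according to the runtime type of the value (duck typing).
    as_any: SerializeAsAny[User]
    # Serialized according to the annotated type only.
    as_user: User


user = UserLogin(name='pydantic', password='hunter2')
outer_dump = OuterModel(as_any=user, as_user=user).model_dump()

print(outer_dump['as_any'])
#> {'name': 'pydantic', 'password': 'hunter2'}
print(outer_dump['as_user'])
#> {'name': 'pydantic'}
```

Because this exposes every field of the runtime type, it is worth using deliberately: in the example it leaks the password field that the default behaviour hides.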

`pickle.dumps(model)`

Pydantic models support efficient pickling and unpickling.

import pickle

from pydantic import BaseModel


class FooBarModel(BaseModel):
    a: str
    b: int


m = FooBarModel(a='hello', b=123)
print(m)
#> a='hello' b=123
data = pickle.dumps(m)
print(data[:20])
#> b'\x80\x04\x95\x95\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main_'
m2 = pickle.loads(data)
print(m2)
#> a='hello' b=123

Advanced include and exclude

The model_dump and model_dump_json methods support include and exclude arguments which can either be sets or dictionaries. This allows nested selection of which fields to export:

from pydantic import BaseModel, SecretStr


class User(BaseModel):
    id: int
    username: str
    password: SecretStr


class Transaction(BaseModel):
    id: str
    user: User
    value: int


t = Transaction(
    id='1234567890',
    user=User(id=42, username='JohnDoe', password='hashedpassword'),
    value=9876543210,
)

# using a set:
print(t.model_dump(exclude={'user', 'value'}))
#> {'id': '1234567890'}

# using a dict:
print(t.model_dump(exclude={'user': {'username', 'password'}, 'value': True}))
#> {'id': '1234567890', 'user': {'id': 42}}

print(t.model_dump(include={'id': True, 'user': {'id'}}))
#> {'id': '1234567890', 'user': {'id': 42}}

The True indicates that we want to exclude or include an entire key, just as if we included it in a set. This can be done at any depth level.
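The same nested selection also applies to items inside lists. The sketch below, using hypothetical `Hobby` and `Person` models, relies on two forms described in the Pydantic documentation: the special key '__all__' to address every item, and an integer index to address a single item.

```python
from typing import List

from pydantic import BaseModel


class Hobby(BaseModel):
    name: str
    info: str


class Person(BaseModel):
    first_name: str
    hobbies: List[Hobby]


p = Person(
    first_name='John',
    hobbies=[
        Hobby(name='Programming', info='Writing code'),
        Hobby(name='Gaming', info='Hell Yeah!'),
    ],
)

# '__all__' applies the nested exclusion to every item in the list.
no_info = p.model_dump(exclude={'hobbies': {'__all__': {'info'}}})
print(no_info)
#> {'first_name': 'John', 'hobbies': [{'name': 'Programming'}, {'name': 'Gaming'}]}

# An integer key targets one item; -1 is the last hobby.
last_only = p.model_dump(exclude={'hobbies': {-1: {'info'}}})
print(last_only['hobbies'])
#> [{'name': 'Programming', 'info': 'Writing code'}, {'name': 'Gaming'}]
```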

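Model- and field-level include and exclude

exclude: bool can also be passed directly to the Field constructor.

Setting exclude on the field constructor (Field(..., exclude=True)) takes priority over the exclude/include arguments of model_dump and model_dump_json.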

from pydantic import BaseModel, Field, SecretStr


class User(BaseModel):
    id: int
    username: str
    password: SecretStr = Field(..., exclude=True)

class Transaction(BaseModel):
    id: str
    value: int = Field(exclude=True)


t = Transaction(
    id='1234567890',
    value=9876543210,
)

print(t.model_dump())
#> {'id': '1234567890'}
print(t.model_dump(include={'id': True, 'value': True}))  # lower priority than Field(exclude=True), so it has no effect
#> {'id': '1234567890'}

That said, setting exclude on the field constructor (Field(..., exclude=True)) does not take priority over the exclude_unset, exclude_none, and exclude_defaults parameters on model_dump and model_dump_json:

from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str
    age: int | None = Field(None, exclude=False)


person = Person(name='Jeremy')

print(person.model_dump())
#> {'name': 'Jeremy', 'age': None}
print(person.model_dump(exclude_none=True))  
#> {'name': 'Jeremy'}
print(person.model_dump(exclude_unset=True))  
#> {'name': 'Jeremy'}
print(person.model_dump(exclude_defaults=True))  
#> {'name': 'Jeremy'}

Serialization context

You can pass a context object to the serialization methods which can be accessed from the info argument to decorated serializer functions. This is useful when you need to dynamically update the serialization behavior at runtime. For example, if you wanted a field to be dumped depending on a dynamically controllable set of allowed values, this could be done by passing the allowed values by context:

from pydantic import BaseModel, SerializationInfo, field_serializer


class Model(BaseModel):
    text: str

    @field_serializer('text')
    def remove_stopwords(self, v: str, info: SerializationInfo):
        context = info.context
        if context:
            stopwords = context.get('stopwords', set())
            v = ' '.join(w for w in v.split() if w.lower() not in stopwords)
        return v


model = Model.model_construct(**{'text': 'This is an example document'})
print(model.model_dump())  # no context
#> {'text': 'This is an example document'}
print(model.model_dump(context={'stopwords': ['this', 'is', 'an']}))
#> {'text': 'example document'}
print(model.model_dump(context={'stopwords': ['document']}))
#> {'text': 'This is an example'}

`model_copy(...)`

model_copy() allows models to be duplicated (with optional updates), which is particularly useful when working with frozen models.

from pydantic import BaseModel


class BarModel(BaseModel):
    whatever: int


class FooBarModel(BaseModel):
    banana: float
    foo: str
    bar: BarModel


m = FooBarModel(banana=3.14, foo='hello', bar={'whatever': 123})

print(m.model_copy(update={'banana': 0}))
#> banana=0 foo='hello' bar=BarModel(whatever=123)
print(id(m.bar) == id(m.model_copy().bar))
#> True
# normal copy gives the same object reference for bar
print(id(m.bar) == id(m.model_copy(deep=True).bar))
#> False
# deep copy gives a new object reference for `bar`
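The frozen-model case mentioned above can be sketched as follows. `Settings` is a hypothetical model; the sketch shows that direct assignment is rejected while model_copy(update=...) produces a modified copy. Note that values passed through update are not validated, so the caller is responsible for supplying data of the right type.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Settings(BaseModel):
    model_config = ConfigDict(frozen=True)

    host: str
    port: int


s = Settings(host='localhost', port=8000)

# Assigning to a field of a frozen model raises a ValidationError.
assignment_failed = False
try:
    s.port = 9000
except ValidationError:
    assignment_failed = True

# model_copy() leaves the original untouched and returns an updated copy.
updated = s.model_copy(update={'port': 9000})

print(assignment_failed)
#> True
print(s.port, updated.port)
#> 8000 9000
```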
